Affordable Fault Tolerance Through Adaptation
Abstract
Fault-tolerant programs are not only difficult to implement but also incur extra costs in performance or resource consumption. Failures are relatively rare, yet the fault-tolerance overhead must be paid regardless of whether any failure occurs during program execution. This paper presents an approach that reduces the cost of fault tolerance: adaptation to a change in the failure model. In particular, a program that assumes no failures (or only benign failures) is combined with a component that detects when failures occur and then switches to a fault-tolerant algorithm. Provided that the detection and adaptation mechanisms are not too expensive, the resulting program has a smaller fault-tolerance overhead, and thus better performance, than a traditional fault-tolerant program. The high cost of fault tolerance is paid only when failures actually occur.
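The idea of running a cheap failure-free algorithm and switching to a fault-tolerant one upon detection can be illustrated with a minimal sketch. It is not the paper's mechanism: the class AdaptiveExecutor, the exception ReplicaFailure, the helper make_primary, and the choice of majority voting over replicas as the fault-tolerant algorithm are all assumptions made for illustration, and failures are assumed to be detectable as exceptions.

```python
from collections import Counter
from typing import Callable, List, Sequence


class ReplicaFailure(Exception):
    """Hypothetical signal that a replica has failed in a detectable way."""


class AdaptiveExecutor:
    """Runs on one replica until a failure is detected, then votes over all."""

    def __init__(self, replicas: Sequence[Callable[[int], int]]):
        self.replicas: List[Callable[[int], int]] = list(replicas)
        self.fault_tolerant = False  # start in the cheap, failure-free mode
        self.calls = 0               # replica invocations, a rough cost proxy

    def _call(self, replica: Callable[[int], int], x: int) -> int:
        self.calls += 1
        return replica(x)

    def _fast(self, x: int) -> int:
        # Failure-free algorithm: trust the first replica only.
        return self._call(self.replicas[0], x)

    def _voted(self, x: int) -> int:
        # Fault-tolerant algorithm (assumed here): majority over live replicas.
        results = []
        for replica in self.replicas:
            try:
                results.append(self._call(replica, x))
            except ReplicaFailure:
                continue
        if not results:
            raise ReplicaFailure("all replicas failed")
        value, _count = Counter(results).most_common(1)[0]
        return value

    def run(self, x: int) -> int:
        if not self.fault_tolerant:
            try:
                return self._fast(x)
            except ReplicaFailure:
                # Detection triggers adaptation: switch algorithms for good.
                self.fault_tolerant = True
        return self._voted(x)


def make_primary(fail_after: int) -> Callable[[int], int]:
    """Hypothetical replica that squares its input, then starts failing."""
    state = {"n": 0}

    def primary(x: int) -> int:
        state["n"] += 1
        if state["n"] > fail_after:
            raise ReplicaFailure("primary crashed")
        return x * x

    return primary


def square(x: int) -> int:
    return x * x


executor = AdaptiveExecutor([make_primary(2), square, square])
outputs = [executor.run(v) for v in (3, 4, 5)]
```

In this sketch the first two requests cost one replica call each. The third request pays for the failed fast attempt plus a full voting round, and every later request pays the voting cost, which mirrors the claim that the overhead is incurred only once failures actually occur.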
Similar resources
Delivering Affordable Fault-tolerance to Commodity Computer Systems
Delivering Affordable Fault-tolerance to Commodity Computer Systems by Shuguang Feng
THICA: A Fault E
The parallel processing power is now within our reach through the affordable cluster computing models. Fault tolerance along with the throughput improvement is the prime research challenge of cluster computing. A new task rearrangement of cluster nodes has been implemented to increase the degree of fault tolerance with the existing cluster computing model built on the top of MPI. The proposed m...
Fault Tolerance for Multiprocessor Systems Via Time Redundant Task Scheduling
Fault tolerance is often considered as a good additional feature for multiprocessor systems but nowadays it is becoming an essential attribute. Fault tolerance can be achieved by the use of dedicated customized hardware that may have the disadvantage of large cost. Another approach to fault tolerance is to exploit existing redundancy in multiprocessor systems via a task scheduling software stra...
The Chameleon Infrastructure for Adaptive, Software Implemented Fault Tolerance
This paper presents Chameleon, an adaptive software infrastructure for supporting different levels of availability requirements in a heterogeneous networked environment. Chameleon provides dependability through the use of ARMORs—Adaptive, Reconfigurable, and Mobile Objects for Reliability. Three broad classes of ARMORs are defined: Managers, Daemons, and Common ARMORs. Key concepts that support...
Improving the palbimm scheduling algorithm for fault tolerance in cloud computing
Cloud computing is the latest technology that involves distributed computation over the Internet. It meets the needs of users through sharing resources and using virtual technology. The workflow user applications refer to a set of tasks to be processed within the cloud environment. Scheduling algorithms have a lot to do with the efficiency of cloud computing environments through selection of su...